99 research outputs found

    Physiological and acoustic characteristics of the female music theater voice

    Effect of being seen on the production of visible speech cues. A pilot study on Lombard speech

    Speech produced in noise (Lombard speech) is characterized by increased vocal effort, but also by amplified lip gestures. The current study examines whether the speaker may seek this enhancement of visible speech cues, even unconsciously, in order to improve his visual intelligibility. One subject played an interactive game in a quiet situation and then in 85 dB of cocktail-party noise, under three conditions of interaction: without interaction, in face-to-face interaction, and in audio-only interaction. The audio signal was recorded simultaneously with articulatory movements, using 3D electromagnetic articulography. The results showed that acoustic modifications of speech in noise were greater when the interlocutor could not see the speaker. Furthermore, tongue movements, which are hardly visible, were not particularly amplified in noise. Lip movements, which are highly visible, were not more enhanced in noise when the interlocutors could see each other; in fact, they were more enhanced in the audio-only situation. These results support the idea that this speaker did not make use of the visual channel to improve his intelligibility, and that his hyperarticulation was just an indirect correlate of increased vocal effort.

    Physiological and acoustic characteristics of the female musical theater voice in ‘belt’ and ‘legit’ qualities

    A study was conducted on six female Music Theatre singers. Audio and electroglottographic (EGG) signals were recorded simultaneously with the vocal tract impedance while the singers produced sustained pitches in two different qualities ('chesty belt', 'legit'). For each quality, two vowels (/ɛ/, /o/) were investigated, at four increasing pitches over the F#4-D5 range (~370-600 Hz). Measured values of glottal parameters (open quotient, amplitude of the EGG signal) support the idea that 'chesty belt' is produced in the first laryngeal mechanism (M1) and 'legit' in the second one (M2). The frequency of the first vocal tract resonance (R1) was found to be systematically higher in 'chesty belt', close to the second voice harmonic (2f0). These observations were consistent with greater intensities and energy above 1 kHz in 'chesty belt' compared to 'legit'.

    Diverse resonance tuning strategies for women singers

    Over the range 200 to 2000 Hz, the fundamental frequency f0 of women's singing voices covers the range of the first two resonances (R1 and R2) of the vocal tract. This allows diverse techniques of resonance tuning. Resonances were measured using broadband excitation at the singers' lips. A commonly noted strategy, used by sopranos and some altos, is to tune R1 close to the fundamental frequency f0 (R1:f0 tuning) once f0 approaches the value of R1 for that vowel in speech. At extremely high pitch, sopranos could no longer increase R1 sufficiently and switched from R1:f0 to R2:f0 tuning. At lower pitch, many singers of various singing styles found it advantageous to use R1:2f0 tuning. Additionally, many sopranos employed R2:2f0 tuning over some of their range, often simultaneously with R1:f0 tuning.

    Converging toward a common speech code: imitative and perceptuo-motor recalibration processes in speech production

    Auditory and somatosensory systems play a key role in speech motor control. In the act of speaking, segmental speech movements are programmed to reach phonemic sensory goals, which in turn are used to estimate actual sensory feedback in order to further control production. Adults' tendency to automatically imitate a number of acoustic-phonetic characteristics of another speaker's speech suggests, however, that speech production relies not only on the intended phonemic sensory goals and actual sensory feedback but also on the processing of external speech inputs. These online adaptive changes in speech production, or phonetic convergence effects, are thought to facilitate conversational exchange by contributing to setting a common perceptuo-motor ground between the speaker and the listener. In line with previous studies on phonetic convergence, we demonstrate here, in a non-interactive situation of communication, online unintentional and voluntary imitative changes in relevant acoustic features of acoustic vowel targets (fundamental and first formant frequencies) during speech production and imitation. In addition, perceptuo-motor recalibration processes, or after-effects, occurred not only after vowel production and imitation but also after auditory categorization of the acoustic vowel targets. Altogether, these findings demonstrate adaptive plasticity of phonemic sensory-motor goals and suggest that, apart from sensory-motor knowledge, speech production continuously draws on perceptual learning from the external speech environment.

    Plasticity of sensory-motor goals in speech production: behavioral evidence from phonetic convergence and speech imitation

    Imitation is one of the major processes by which humans develop social interactions. In speech communication, imitative processes are used from birth to adulthood, as highlighted by children's mimicking abilities and by adults' tendency to automatically "imitate" a number of acoustic-phonetic characteristics of another speaker's speech. These adaptive changes are thought to play a key role in speech development and acquisition and to facilitate conversational exchange by contributing to setting a common perceptuo-motor link between speakers. Based on acoustic analyses of speech production in various laboratory tasks, the present study aimed to better characterize the sensory-to-motor adaptive processes involved in unintentional as well as voluntary speech imitation, and to test possible motor plastic changes due to auditory-motor recalibration mechanisms.

    Make That Sound More 'Metallic': Towards a Perceptually Relevant Control of the Timbre of Synthesizer Sounds Using a Variational Autoencoder

    In this article, we propose a new method of sound transformation based on control parameters that are intuitive and relevant for musicians. This method uses a variational autoencoder (VAE) model that is first trained in an unsupervised manner on a large dataset of synthesizer sounds. Then, a perceptual regularization term is added to the loss function to be optimized, and a supervised fine-tuning of the model is carried out using a small subset of perceptually labeled sounds. The labels were obtained from a perceptual test of Verbal Attribute Magnitude Estimation in which listeners rated this training sound dataset along eight perceptual dimensions (French equivalents of 'metallic, warm, breathy, vibrating, percussive, resonating, evolving, aggressive'). These dimensions were identified as relevant for the description of synthesizer sounds in a first Free Verbalization test. The resulting VAE model was evaluated by objective reconstruction measures and a perceptual test. Both showed that the model was able, to a certain extent, to capture the acoustic properties of most of the perceptual dimensions and to transform sound timbre along at least two of them ('aggressive' and 'vibrating') in a perceptually relevant manner. Moreover, it was able to generalize to unseen samples even though a small set of labeled sounds was used.

    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised, at least for some listeners, by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns as a response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output.

    Communiquer en environnement bruyant : de l'adaptation jusqu'au forçage vocal

    This work originates from the observation that some people are more likely than others to develop voice disorders when they often have to communicate in adverse conditions such as noisy environments. This thesis aims to understand what causes these differences, in order to prevent vocal straining. To account for these differences in susceptibility, previous studies have essentially focused on the variability in physiological constitutions and on the environmental factors faced by people subject to vocal straining. The work presented here instead explores the hypothesis that these differences could also result from the speaker's adaptation behaviour in the particular communicative situation. First, we examined how the different acoustic and articulatory speech modifications can be interpreted as indicators of communicative strategies adopted by the speaker to stand out from the ambient noise, to facilitate the audiovisual perception of phonetic units, to enhance the prosodic cues of discourse structure, or to highlight the most informative words. Several arguments are put forward for considering speech adaptation in noise not only as vocal, imposed and reflexive but also as cognitive, communicative and monitored by the speaker. To assess this point, two main databases of semi-spontaneous speech were recorded, in which speakers played interactive games in noise played over loudspeakers. We first assessed the suitability of these methodologies for testing our hypotheses on vocal straining and speech adaptation in noisy environments. The inter-individual comparison of adaptation behaviours led us to identify differences in the combined use of these communicative strategies (which are not equivalent in terms of load on the larynx) and in the reorganization of adaptation strategies for varying noise types and levels. Finally, these varying adaptation behaviours were related to different laryngeal profiles. These results therefore offer a new perspective for the study and prevention of vocal straining.